Building Arabic Syntax Ontology based on Dependency Grammar Framework 


Tarik elmalki 

Abstract: The Arabic Syntax Ontology (ASO) is designed to represent formally the most basic grammatical categories and
their interrelationships as used in the syntactic analysis of Arabic grammar. Our ontological approach rests on two pillars:
linguistically, ASO is based on the dependency grammar framework, which analyzes a sentence in terms of a set of labeled
connections (Subject, Object, ...); formally, ASO is described in terms of Description Logics.


1 .Introduction 

Arabic Syntax Ontology (ASO for short) 1 is designed to
represent formally the most basic grammatical categories
and their interrelationships as used in the syntactic analysis
of Arabic grammar. More than just a controlled
vocabulary of terms used in Arabic grammar, ASO also
attempts to provide tools for users to infer implicit facts
from a set of asserted axioms. Thus, via the inference system
of ASO, one can deduce from a Subject relationship
between a Verb and a Noun that Nominative Case
is assigned to the Noun and that the Verb is in the
active form. Furthermore, ASO decides
whether its set of assertions is consistent: for
example, if we assign accusative case to a noun that
occupies a subject position, the reasoning system reports
that this assertion is inconsistent with the grammatical rules
axiomatized in ASO.

ASO consists of a hierarchical description of the main
grammatical concepts (Noun, Verb, Particle, Tense,
Phi-feature, ...), along with descriptions of their relationships
(Subject, Object, Circumstance, ...), as well as constraints
on these concepts and relationships (Fig 1). So that it can be
parsed by a computer program, ASO is available in a
machine-readable format.

Linguistically, ASO is based on the dependency grammar
framework, which analyzes a sentence in terms of a set of labeled
connections (Subject, Object, ...), i.e. regent-dependent
relationships holding between words.

In this article we extend DG to deal with functional
categories. This forces us to define another type of
connection, not mentioned in the dependency grammar
literature: functional relationships (HasTense,
HasCase, ...).

After describing Traditional Arabic Grammars
linguistically, we turn to formalizing the result logically
by means of Description Logic constructors.

The remainder of this article is organized as follows. In
Section 1 we give an overview of DL. Section 2
provides a formal description of Classical Arabic
Grammar based on the dependency grammar approach.
Section 3 shows how ASO can be queried with SPARQL.

1- Description Logic Overview

We can identify two major objectives that DL fulfils. The 
first goal is to describe formally an application domain by 
specifying concepts (also known as classes), roles (also 
known as properties) and individuals (also known as 
objects) that are instances of these concepts. The second
goal is to provide a reasoning service which allows one to
infer implicitly represented knowledge from the
knowledge that is explicitly contained in the knowledge
base.

So a knowledge base consists of two parts: 

KB = { TBox , ABox } 

The terminological knowledge TBox (the vocabulary of
an application domain) refers to classes of objects and
their relationships, while the assertional knowledge ABox
contains assertions about named individuals in terms of
this vocabulary.

The vocabulary of the TBox consists of concepts, which are
viewed as unary predicates and denote sets of individuals,
and roles, which are interpreted as binary predicates and
denote relationships between individuals.


Fig 1: Grammatical Categories and syntactic relation 


1 -available at http://www.arabicontology.org/ 


1-1. Syntax 

Concept descriptions in SHIQ are formed according to
the following syntax rule:

if C and D are concepts, r is a role and n is a
nonnegative integer, then $C \sqcap D$, $C \sqcup D$, $\neg C$, $\forall r.C$,
$\exists r.C$, $\geq n\, r.C$ and $\leq n\, r.C$ are also concepts.
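The constructors listed above can be read as set operations over a finite domain. The following is a minimal sketch of that reading, not part of ASO itself: the class names (`Atom`, `Not`, `And`, `Or`, `Restrict`), the function `extension` and the sample individuals are all illustrative assumptions.

```python
from dataclasses import dataclass

# Illustrative encodings of the concept constructors (names are assumptions).
@dataclass(frozen=True)
class Atom:
    name: str

@dataclass(frozen=True)
class Not:
    c: object

@dataclass(frozen=True)
class And:
    c: object
    d: object

@dataclass(frozen=True)
class Or:
    c: object
    d: object

@dataclass(frozen=True)
class Restrict:
    kind: str   # "all" (forall), "some" (exists), "min" (>= n) or "max" (<= n)
    r: str      # role name
    c: object   # filler concept
    n: int = 1  # only used by "min" and "max"

def extension(c, domain, concepts, roles):
    """Return the set of individuals of `domain` that belong to concept `c`."""
    if isinstance(c, Atom):
        return set(concepts.get(c.name, ())) & domain
    if isinstance(c, Not):
        return domain - extension(c.c, domain, concepts, roles)
    if isinstance(c, And):
        return extension(c.c, domain, concepts, roles) & extension(c.d, domain, concepts, roles)
    if isinstance(c, Or):
        return extension(c.c, domain, concepts, roles) | extension(c.d, domain, concepts, roles)
    if isinstance(c, Restrict):
        filler = extension(c.c, domain, concepts, roles)
        result = set()
        for x in domain:
            successors = {b for (a, b) in roles.get(c.r, ()) if a == x}
            hits = len(successors & filler)  # r-successors of x that are in the filler
            keep = {
                "all": hits == len(successors),
                "some": hits >= 1,
                "min": hits >= c.n,
                "max": hits <= c.n,
            }[c.kind]
            if keep:
                result.add(x)
        return result
    raise TypeError(f"unknown concept: {c!r}")

# Hypothetical toy interpretation: one verb, one noun, one case value.
domain = {"akala", "al-walad", "nom"}
concepts = {"Verb": {"akala"}, "Noun": {"al-walad"}, "Nominative": {"nom"}}
roles = {"HasCase": {("al-walad", "nom")}}

nominative_noun = And(Atom("Noun"), Restrict("some", "HasCase", Atom("Nominative")))
print(extension(nominative_noun, domain, concepts, roles))
```

Note that a universal restriction holds vacuously for individuals with no successors, so `Restrict("all", "HasCase", Atom("Nominative"))` covers the whole toy domain here.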


The corresponding syntax for Web Ontology Language is
shown in (Fig 2):

1-2. Semantics

To provide a formal meaning for the above syntax, we
need to interpret each concept, role and individual in terms
of a model which consists of two parts: a non-empty set $\Delta^I$
(the domain of the interpretation) and an interpretation
function $\cdot^I$, which assigns to every atomic concept A a set
$A^I \subseteq \Delta^I$ and to every atomic role R a binary relation
$R^I \subseteq \Delta^I \times \Delta^I$.

Fig 3

2- Traditional Arabic Grammars

Traditional Arabic Grammars (TAG) can be represented
as a triple (GC, R, AX), where GC denotes the Grammatical
Categories, R denotes a set of Syntactic Relations between
elements of GC, and AX denotes the axioms.

Every sentence of a language is thus assigned two
levels of syntactic description. The dependency level
describes the sentence's structure in terms of directed arcs,
called dependencies or connections; each of
these arcs links a dependent to a head (or regent), and each
connection is labeled with the role of the dependent in
relation to the head.

Fig 5

At the other level of description, the categorical structure
aims to provide a precise definition of each entity in terms
of an attribute-value matrix with two columns, one for
the feature names (Number, Gender, Tense, ...) and the
other for the values. For example, the feature named
Number may have the value Singular, Plural or Dual, and
the feature named Tense may have the value Past, Present
or Future.

Each part of speech is specified by a fixed
matrix; the feature matrix for the Noun is written like
this:

Noun
    HasGender        Feminine or Masculine
    HasNumber        Sing or Plur or Dual
    HasPattern       PatternName
    IsDefinite       Def or Indef
    HasCase          Nom or Accu or Dat
    HasTransitive    Trans or Untrans

For the Verb:

The Feature Matrix of each lexical entity is inactive, or
unvalued, until the entities merge with each other at the
dependency level. Once the merging process is done, an
appropriate value is allocated to each feature
name; otherwise the sentence is ungrammatical.

Both levels interact with each other. For example,
syntactic dependencies govern the Case and Voice features:

$Subject(Verb, Noun) \rightarrow HasCase(Noun, Nominative) \land HasVoice(Verb, Active)$

On the other hand, the Transitive feature governs the Object
dependency:

$HasTransitive(Verb, Transitive) \rightarrow (Object(Verb, Noun) \rightarrow HasCase(Noun, Accusative))$

Some features govern other features, as in:

$HasNumber(Noun, Dual) \rightarrow HasPattern(Noun, DualPatternNoun)$
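The interaction between the two levels can be illustrated with a small forward-chaining sketch. It is only an approximation of what a Description Logic reasoner would do: the function names `infer_features` and `conflicts`, the triple encoding, and the transliterated words are assumptions made for the example, and the conflict test assumes that a word carries at most one value per feature.

```python
def infer_features(arcs, features):
    """Derive feature assignments from dependency arcs and asserted features.

    arcs:     iterable of (relation, head, dependent)
    features: set of (function, word, value) assertions
    """
    inferred = set()
    for rel, head, dep in arcs:
        if rel == "Subject":
            # Subject(Verb, Noun) -> Nominative case on the noun, Active voice on the verb
            inferred.add(("HasCase", dep, "Nominative"))
            inferred.add(("HasVoice", head, "Active"))
        elif rel == "Object" and ("HasTransitive", head, "Transitive") in features:
            # The object of a transitive verb receives Accusative case
            inferred.add(("HasCase", dep, "Accusative"))
    for function, word, value in features:
        if function == "HasNumber" and value == "Dual":
            # A feature governing another feature
            inferred.add(("HasPattern", word, "DualPatternNoun"))
    return inferred

def conflicts(asserted, inferred):
    """List pairs that give the same word two different values for one feature.

    Assumption of this sketch: every feature is single-valued per word.
    """
    clashes = []
    for a in sorted(asserted):
        for b in sorted(inferred):
            if a[:2] == b[:2] and a[2] != b[2]:
                clashes.append((a, b))
    return clashes

# Hypothetical analysis of "the boy ate the apple" (transliterations are illustrative).
arcs = [("Subject", "akala", "al-walad"), ("Object", "akala", "al-tuffaha")]
features = {("HasTransitive", "akala", "Transitive")}
inferred = infer_features(arcs, features)

# Assigning Accusative to the subject clashes with the inferred Nominative.
print(conflicts({("HasCase", "al-walad", "Accusative")}, inferred))
```

The second function mirrors the consistency check described in the Introduction: an accusative noun in subject position is reported as a clash rather than silently accepted.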

2-1. Grammatical Categories: 

Grammatical Categories (GC) can be divided into two 
subcategories: Lexical Categories (LC) and Functional 
Categories (FC): 

-1- $GC = LC \cup FC$

The term 'Lexical Categories' is used here to cover what
TAG calls parts of speech. Scholars of Arabic grammar
agree that speech is divided into three main categories:
Nouns (N), Verbs (V) and
Particles (P).

-2- $LC = N \cup V \cup P$

Functional Categories (FC), such as Tense, Aspect and
Template Pattern, have no thematic content. They comprise a
collection of functional features into which lexical units
can be broken down. Each feature is associated with a
function relating a lexical unit to the appropriate
feature value.

-3- $FC = Tense \cup Aspect \cup Number \cup Gender \cup Pattern \cup Transitive \cup Voice \cup Person \cup Definiteness \cup Case$

• Tense

Tense is seen as a "grammaticalisation of location in time" 2
and can be realized linguistically in two different ways:
lexically, by adding a prefix to a verb, and
grammatically, by altering the morphological pattern of
the verb.

The Tense feature includes three basic values: Past, Present and
Future.

-4- Tense = {Past, Present, Future}

2 - http://www.grammaticalfeatures.net/features/tense.html

As Tense is realized specifically on verbs, we
define a function HasTense that operates only on
verbs and associates the appropriate Tense value with each
verb in the sentence.

-5- HasTense(Verb, Tense)

Ex: HasTense (l$ , past) 

HasTense (J^W, Future) 

• Number

"'Number' is a grammatical category which encodes
quantification over entities or events denoted by nouns or
nominal elements" 3

The Number feature applies to Nouns and their
derivatives, and forms a closed set consisting of three
values:

-6- Number = {Plural, Singular, Dual}

Number values are assigned to Nouns via the function
HasNumber():

-7- HasNumber(Noun, Number)

Ex: HasNumber ( l^j, Singular) 

• Gender

The Gender feature pertains to nouns. Morphologically, most
Nouns that end with the morpheme 'a' are feminine, and
every noun takes one of two values:

-8- Gender = {Masculine, Feminine}

Gender is realized specifically on Nouns; thus we
assume a function mapping each Noun to a Gender value:

-9- HasGender(Noun, Gender)

Ex: HasGender (l^j, Masculine) 

HasGender (sij*l, Feminine) 

• Person

The Person feature refers to the participants in an event and
affects verbs. The value inventory for the feature Person
includes:

Person = {1st person, 2nd person, 3rd person}

HasPerson(Verb, Person)

• Definiteness

In Arabic grammar, Nouns may be either definite or indefinite:

-10- Definiteness = {Definite, Indefinite}

3 - http://www.grammaticalfeatures.net/features/number.html

Definiteness is realized via the function IsDefinite():

-11- IsDefinite(Noun, Definiteness)

Ex: IsDefinite (J^j, Indefinite) 

Definiteness can be encoded in two ways: lexically,
by adding the definite article as a prefix to the Noun,
and syntactically, by relating a prefixless noun (one without
the article) to a definite noun. The prefixless noun thus
receives its definiteness through the genitive relationship: in -12-,
the noun without the article is nevertheless seen as definite
because it occurs in a genitive relation with a definite noun.


Iluj -12 - 
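The two ways of encoding definiteness can be sketched as a small lookup. This is an illustrative assumption rather than ASO's own mechanism: the function name `is_definite`, the two dictionaries and the transliterated nouns are hypothetical, and the sketch additionally assumes that definiteness is inherited along a chain of genitive relations.

```python
def is_definite(noun, has_article, genitive_complement):
    """Return True if `noun` is definite lexically or through a genitive relation.

    has_article:         dict mapping a noun to True if it carries the article prefix
    genitive_complement: dict mapping a prefixless noun to the noun it governs
                         in a genitive relation
    """
    seen = set()
    while noun is not None and noun not in seen:
        if has_article.get(noun, False):
            return True               # lexical encoding: the article is present
        seen.add(noun)                # guard against cyclic input
        noun = genitive_complement.get(noun)  # syntactic encoding: look at the complement
    return False

# Hypothetical example: "kitab al-walad" (the boy's book).
has_article = {"kitab": False, "al-walad": True, "walad": False}
print(is_definite("kitab", has_article, {"kitab": "al-walad"}))
print(is_definite("kitab", has_article, {"kitab": "walad"}))
```

The first call treats the prefixless noun as definite because its genitive complement carries the article; the second does not, since neither noun does.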


• PATTERN

In Arabic, the morphological derivation process
results from the combination of two abstract components: a
lexical root and a specific template pattern.
Each pattern is associated with a prior meaning, in
such a way that a word's meaning is the product of both
levels, the lexical root and the morphological pattern. Thus the
active-participle pattern denotes the one doing something, while
the passive-participle pattern denotes the entity onto which the
action is done.



Fig 7 


• TRANSITIVITY

The Transitive feature is a formal mechanism for representing
the objects required by a particular verb or by certain
derivative Nouns. For example, the noun (hitter)
requires one object, whilst the verb (give) requires
two objects. This is encoded in the lexical entry for the
verb. Certain verbs, like (sit), take no object. So we
have three values for the Transitive feature:

-13- Transitive = {0, 1, 2}

So HasTransitive() operates on either a Noun or a Verb by
assigning it a Transitive value:

-14- HasTransitive(Noun, Transitive)

Example:

dirham.ACC    zaydan.ACC    Def.man.NOM    give.past
Object2       Object1       Subject

HasTransitive(V=V2)
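The Transitive value can be checked against the number of object arcs a verb actually governs. The sketch below assumes that object arcs are labeled `Object1`, `Object2`, ... as in the example, and that the Transitive value must match the object count exactly; `valence_ok` and the sample data are hypothetical names introduced for illustration.

```python
def valence_ok(verb, transitivity, arcs):
    """Check that `verb` governs exactly as many objects as its Transitive value.

    transitivity: dict mapping a verb to 0, 1 or 2
    arcs:         iterable of (relation, head, dependent)
    """
    n_objects = sum(
        1 for rel, head, _dep in arcs
        if head == verb and rel.startswith("Object")
    )
    return n_objects == transitivity[verb]

# The ditransitive example above, with English glosses standing in for the words.
arcs = [
    ("Subject", "give", "man"),
    ("Object1", "give", "Zayd"),
    ("Object2", "give", "dirham"),
]
transitivity = {"give": 2, "sit": 0}

print(valence_ok("give", transitivity, arcs))
print(valence_ok("sit", transitivity, [("Subject", "sit", "man")]))
```

Requiring an exact match is a simplification; a fuller treatment would have to decide how omitted objects are handled.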

• CASE

Case can be defined as a system of marking nouns, verbs
and clauses that reflects the syntactic function performed
by those components in the sentence.

The cases encountered in TAG take four values:

-15- Case = {Nominative, Accusative, Jussive, Dative}

-16- HasCase(Noun, Case)

-17- HasCase(Verb, Case)

Ex: HasCase(d^j, Nominative) 

Ex: HasCase(^j, Accusative) 

2-2. Relationships

R in the triple above denotes a set of Syntactic Relations
between elements of GC. Hence a sentence S can be
defined formally in terms of ordered pairs of words:

-18- $S = \bigwedge_{x, y \in LC} R(x, y)$

So the sentence (-19-) is analyzed into two
relational components, as shown in (-20-):

-19-

The apple-ACC    The boy-NOM    ate
Noun             Noun           Verb

"The boy ate the apple"

Fig 8

-20- $S = \mathrm{Subject}(\text{ate}, \text{the boy}) \land \mathrm{Object}(\text{ate}, \text{the apple})$

These Relations are required to satisfy the following
axioms:

1- Irreflexivity

A Syntactic Relation is irreflexive, as no lexical element
can stand in relation to itself:

-21- $\forall x \in LC:\ \neg R(x, x)$

$\neg \mathrm{Subject}(\text{ate}, \text{ate})$

2- Asymmetry

Given a syntactic relation R, whenever it holds between
two words x and y, it never holds in the opposite direction:

-22- $\forall x, y \in LC:\ R(x, y) \rightarrow \neg R(y, x)$

$\mathrm{Subject}(\text{ate}, \text{the boy}) \rightarrow \neg \mathrm{Subject}(\text{the boy}, \text{ate})$

3- Non-transitivity

Given a syntactic relation R, whenever it holds
between x and y and between y and z, it never holds
between x and z:

-23- $\forall x, y, z \in LC:\ R(x, y) \land R(y, z) \rightarrow \neg R(x, z)$

4- There is a unique governor x in the relation R
which dominates y:

-24- $\exists! x \in LC:\ R(x, y)$
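The four axioms can be checked mechanically on a set of dependency arcs. The sketch below is one possible reading, with hypothetical names (`check_axioms`, the triple encoding); for the fourth axiom it tests only uniqueness of the governor within each relation, not existence, since a root word has no governor.

```python
def check_axioms(arcs):
    """Return a list of (relation, violated axiom) pairs for a set of arcs.

    arcs: iterable of (relation, governor, dependent)
    """
    by_relation = {}
    for rel, x, y in arcs:
        by_relation.setdefault(rel, set()).add((x, y))

    violations = []
    for rel in sorted(by_relation):
        pairs = by_relation[rel]
        # 1- Irreflexivity: no R(x, x)
        if any(x == y for x, y in pairs):
            violations.append((rel, "irreflexivity"))
        # 2- Asymmetry: R(x, y) excludes R(y, x)
        if any((y, x) in pairs for x, y in pairs if x != y):
            violations.append((rel, "asymmetry"))
        # 3- Non-transitivity: R(x, y) and R(y, z) exclude R(x, z)
        if any((x, z) in pairs for x, y in pairs for y2, z in pairs if y == y2):
            violations.append((rel, "non-transitivity"))
        # 4- Unique governor: each dependent has at most one governor in R
        governors = {}
        for x, y in pairs:
            governors.setdefault(y, set()).add(x)
        if any(len(xs) > 1 for xs in governors.values()):
            violations.append((rel, "unique governor"))
    return violations

sentence = {("Subject", "ate", "the boy"), ("Object", "ate", "the apple")}
print(check_axioms(sentence))  # []
print(check_axioms({("Subject", "ate", "the boy"), ("Subject", "the boy", "ate")}))
```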

1-3. Lexical and Functional Relationships

Two types of relations must be distinguished according to
the roles they perform in the construction of a sentence: Lexical
Relations (LR) and Functional Relations (FR):

$R = LR \cup FR$

The former class of relations (LR) is in charge of
merging lexical categories:

-25- $\forall x, y \in LC:\ LR(x, y)$

Whilst the latter class of relations (FR) maps a word to a
set of feature values (Tense, Case, Aspect, phi-features, ...):

-26- $(\forall x \in LC)(\exists y \in FC)\ FR(x, y)$

Thus nominative case is assigned to the word
occupying the governed position:

-27- $Subject(x, y) \rightarrow HasCase(y, Nominative)$

where x and y belong to the Lexical Categories and Nominative
is a value of the Case feature.


2-3. Entity Definition

The inventory of features that we need
to describe TAG formally is summarized in the table below:

Feature        Value                                      Function
------------   ----------------------------------------   ------------------------------
Gender         Masculine, Feminine                        HasGender(Noun, Gender)
Person         1st person, 2nd person, 3rd person         HasPerson(Verb, Person)
Number         Plural, Singular, Dual                     HasNumber(Noun, Number)
Definiteness   Definite, Indefinite                       IsDefinite(Noun, Definiteness)
Case           Nominative, Accusative, Jussive, Dative    HasCase(Noun, Case)
                                                          HasCase(Verb, Case)
Tense          Past, Present, Future                      HasTense(Verb, Tense)
Transitive     transitive, intransitive                   HasTransitive(Verb, Transitive)
Pattern        VerbPattern, NounPattern                   HasPattern(Verb, VerbPattern)
                                                          HasPattern(Noun, NounPattern)
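The feature inventory can be held in a small lookup structure that records, for each function, the categories it applies to and the values it admits. This is a sketch under assumptions: the names `FEATURES` and `validate` are hypothetical, the Person values follow the definition given earlier in the text, HasTransitive follows the table in applying to Verbs only, and Pattern is left out because its admissible values depend on the category.

```python
# function name -> (categories it applies to, admissible values)
FEATURES = {
    "HasGender": ({"Noun"}, {"Masculine", "Feminine"}),
    "HasPerson": ({"Verb"}, {"1st person", "2nd person", "3rd person"}),
    "HasNumber": ({"Noun"}, {"Singular", "Dual", "Plural"}),
    "IsDefinite": ({"Noun"}, {"Definite", "Indefinite"}),
    "HasCase": ({"Noun", "Verb"}, {"Nominative", "Accusative", "Jussive", "Dative"}),
    "HasTense": ({"Verb"}, {"Past", "Present", "Future"}),
    "HasTransitive": ({"Verb"}, {"Transitive", "Intransitive"}),
}

def validate(function, category, value):
    """Return True when `function` may assign `value` to a word of `category`."""
    if function not in FEATURES:
        raise KeyError(f"unknown feature function: {function}")
    categories, values = FEATURES[function]
    return category in categories and value in values

print(validate("HasTense", "Verb", "Past"))   # True
print(validate("HasTense", "Noun", "Past"))   # False: Tense operates only on verbs
```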


• Noun definition:

Using the constructors of Description Logics, we can describe
a Noun as:

$Noun \equiv \forall HasGender.Gender \sqcap \forall HasNumber.Number \sqcap \neg\forall HasTense.Tense \sqcap \forall HasCase.Case \sqcap \exists HasTransitive.Transitive \sqcap \forall IsDefinite.Definiteness$

We can define the nominative noun and the accusative noun in
the same way:

$NominativeNoun \equiv \forall HasCase.Nominative \sqcap Noun$

$AccusativeNoun \equiv \forall HasCase.Accusative \sqcap Noun$

• Verb definition:

$Verb \equiv \neg\forall HasGender.Gender \sqcap \neg\forall HasNumber.Number \sqcap \forall HasTense.Tense \sqcap \exists HasCase.Case \sqcap \forall HasTransitive.Transitive \sqcap \forall HasVoice.Voice \sqcap \forall HasMood.Mood$

• Particle Definition

$Particle \equiv \neg\forall HasGender.Gender \sqcap \neg\forall HasNumber.Number \sqcap \neg\forall HasTense.Tense \sqcap \neg\forall HasCase.Case$

We are now in a position to define certain syntactic
relationships at the dependency level. It follows
from the above that the Subject relationship can be
redefined as a syntactic relation whose domain is Verb or
Active Participle and whose range is NominativeNoun:

-28- Subject(Verb, NominativeNoun)

In the same way we can define the syntactic object:

Object(Verb, AccusativeNoun)

3- Using SPARQL to Query Arabic Syntax Ontology

SPARQL is an SQL-like language for querying ASO data.
In relational database terms, the data can be regarded
as a table with three columns: the subject column, the
predicate column and the object column. To query
this table we use expressions with variables.

Suppose we want to retrieve all possible relationships R
that hold between a nominative noun and a governor X. This
query can be captured in terms of Description Logics as
(-29-):

-29- R(X, NominativeNoun)

The result of the query is shown in (Fig 9).

Fig 9

If we look for a specific kind of governor, for example a
particle governor, we can write a query like (-30-):

-30- $R(X, NominativeNoun) \land Particle(X)$

We obtain the result (Fig 10):

Fig 10
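The way such queries are evaluated against a three-column table can be sketched without a SPARQL engine. The code below is a simplified stand-in for SPARQL basic graph pattern matching, not the ASO query interface: the functions `match` and `query`, the `type` predicate, the placeholder relation `Rel` and the individuals `v1`, `p1`, `n1`, `n2` are all hypothetical.

```python
def match(pattern, triple, binding):
    """Unify one triple pattern with one triple; terms starting with '?' are variables."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            if new.setdefault(p, t) != t:   # variable already bound to something else
                return None
        elif p != t:                        # constant mismatch
            return None
    return new

def query(patterns, triples):
    """Evaluate a conjunction of triple patterns and return all variable bindings."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for b in bindings
            for t in triples
            if (extended := match(pattern, t, b)) is not None
        ]
    return bindings

# Hypothetical data in subject / predicate / object form.
triples = [
    ("v1", "type", "Verb"),
    ("p1", "type", "Particle"),
    ("n1", "type", "NominativeNoun"),
    ("n2", "type", "NominativeNoun"),
    ("v1", "Subject", "n1"),
    ("p1", "Rel", "n2"),
]

# (-29-): every relation R between a governor X and a nominative noun.
q29 = [("?x", "?r", "?n"), ("?n", "type", "NominativeNoun")]
# (-30-): the same, restricted to particle governors.
q30 = q29 + [("?x", "type", "Particle")]

print(sorted((b["?x"], b["?r"]) for b in query(q29, triples)))
print(sorted((b["?x"], b["?r"]) for b in query(q30, triples)))
```

Each pattern plays the role of one row template over the subject, predicate and object columns; adding the `Particle` pattern narrows the bindings in the same way the conjunct does in (-30-).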